feat: add seed_oss, deepseek_v31, qwen3_coder_xml tool parsers #13
Merged
raullenchai merged 7 commits into main on Mar 4, 2026
Port 3 upstream vLLM tool parsers for popular MLX models:
- seed_oss: GPT-OSS-20B XML format (<seed:tool_call> + <seed:think>)
- deepseek_v31: DeepSeek V3.1/R1-0528 unicode special tokens
- qwen3_coder_xml: Qwen3-Coder XML format (<tool_call>/<function=...>)
Includes 72 upstream regression tests and eval config updates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
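For readers unfamiliar with the Qwen3-Coder XML shape named above, here is a minimal, non-streaming sketch of extracting a call from it. It assumes parameters appear as `<parameter=NAME>...</parameter>` inside the `<function=...>` block, which this PR does not spell out; `parse_tool_calls` and both regexes are hypothetical names, not the ported parser's API, and schema-based type conversion is left out.

```python
import re

# Assumed layout: <tool_call><function=NAME><parameter=KEY>VALUE</parameter>...</function></tool_call>
FUNCTION_RE = re.compile(
    r"<tool_call>\s*<function=(?P<name>[^>]+)>(?P<body>.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
PARAM_RE = re.compile(
    r"<parameter=(?P<key>[^>]+)>(?P<value>.*?)</parameter>",
    re.DOTALL,
)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every complete tool call found in text; values stay as strings."""
    calls = []
    for fn in FUNCTION_RE.finditer(text):
        arguments = {
            p.group("key"): p.group("value").strip()
            for p in PARAM_RE.finditer(fn.group("body"))
        }
        calls.append({"name": fn.group("name"), "arguments": arguments})
    return calls


sample = (
    "<tool_call>\n<function=get_weather>\n"
    "<parameter=city>\nParis\n</parameter>\n"
    "</function>\n</tool_call>"
)
calls = parse_tool_calls(sample)
```

A streaming parser has to do the same work incrementally on partial text, which is what the later commits in this PR deal with.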
- Fix GLM47 test_streaming_no_tool_calls to match current strip_think_tags
behavior (strips leading whitespace from content deltas)
- Add multi-step streaming tests for seed_oss and qwen3coder that verify
header + { + params + } are all emitted across multiple calls
- Add note that run_all_models.sh paths are machine-specific
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix GLM47 streaming: strip_think_tags was eating inter-word spaces on
normal content deltas; now only strips when </think> is actually present
- Add multi-step streaming tests for seed_oss and qwen3coder that verify
complete tool call emission (header + { + params + }) with fine-grained
deltas matching realistic token boundaries
- Add note that run_all_models.sh paths are machine-specific
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Streaming completeness (seed_oss + qwen3coder):
- When the function body is already complete at header-detection time, emit the full tool call (name + arguments) in one chunk instead of header-only. Prevents truncated output when coarse deltas or max_tokens leave no further parser calls.
- When tool_call_start is detected, fall through to header parsing instead of returning None, since the header may already be available.
GLM47 streaming:
- Only call strip_think_tags when </think> is actually present in the delta, preventing inter-word spaces from being eaten on normal content.
Tests:
- Add coarse-delta streaming tests that verify complete arguments are emitted even with a single large chunk (seed_oss + qwen3coder).
- Fix GLM47 streaming test to expect preserved whitespace.
Other:
- Remove misleading MODEL_DIR env var reference from run_all_models.sh.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
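The GLM47 whitespace fix can be illustrated with a small sketch. The `strip_think_tags` below is a hypothetical stand-in (the real helper is not shown in this PR); it is only assumed to drop text up to `</think>` and trim leading whitespace, which is enough to reproduce the eaten-space symptom.

```python
def strip_think_tags(text: str) -> str:
    """Hypothetical stand-in: drop everything up to </think>, then trim leading whitespace."""
    _, _, after = text.rpartition("</think>")
    return after.lstrip()


def content_delta(delta: str) -> str:
    """Guarded version: only strip when the closing tag is really in this delta."""
    if "</think>" in delta:
        return strip_think_tags(delta)
    return delta  # normal content passes through, leading spaces intact


deltas = ["Hello", " world", "!"]

# Unguarded: every delta is stripped, so the space before "world" disappears.
unguarded = "".join(strip_think_tags(d) for d in deltas)

# Guarded: plain content deltas are left alone.
guarded = "".join(content_delta(d) for d in deltas)
```

With token-sized deltas, a leading space is usually the only separator between words, so stripping each delta unconditionally glues words together.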
The GPT-OSS chat template generates tool calls as:
<|start|>assistant to=functions.NAME<|channel|>commentary json<|message|>ARGS<|call|>
But the harmony regex expected:
<|channel|>commentary to=functions.NAME <|message|>ARGS<|call|>
In real output, to=functions.NAME comes before <|channel|>commentary, not after. This mismatch caused the 17% tool calling score.
Fix: support both formats (real + legacy test format) via alternation. Also accept <|end|> as final channel terminator alongside <|return|>.
Revert GPT-OSS eval config from seed_oss back to harmony.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
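A rough sketch of the alternation idea follows. The pattern and helper names (`TOOL_CALL_RE`, `FINAL_RE`, `extract_tool_calls`, `final_text`) are illustrative assumptions, not the actual HarmonyToolParser code; only the two token layouts and the two terminators come from the commit message.

```python
import json
import re

# One branch per layout: the real chat-template order, then the legacy test order.
TOOL_CALL_RE = re.compile(
    r"(?:"
    r"to=functions\.(?P<real>[\w.-]+)\s*<\|channel\|>commentary[^<]*"
    r"|"
    r"<\|channel\|>commentary\s+to=functions\.(?P<legacy>[\w.-]+)[^<]*"
    r")"
    r"<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)

# The final channel may be closed by either <|return|> or <|end|>.
FINAL_RE = re.compile(
    r"<\|channel\|>final<\|message\|>(?P<text>.*?)(?:<\|return\|>|<\|end\|>)",
    re.DOTALL,
)


def extract_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Return (function name, parsed JSON arguments) for each tool call found."""
    calls = []
    for m in TOOL_CALL_RE.finditer(text):
        name = m.group("real") or m.group("legacy")
        calls.append((name, json.loads(m.group("args"))))
    return calls


def final_text(text: str):
    """Return the final-channel content, or None if no final channel is present."""
    m = FINAL_RE.search(text)
    return m.group("text") if m else None


real = ('<|start|>assistant to=functions.get_weather<|channel|>commentary json'
        '<|message|>{"city": "Paris"}<|call|>')
legacy = ('<|channel|>commentary to=functions.get_weather '
          '<|message|>{"city": "Paris"}<|call|>')
```

Both `real` and `legacy` yield the same `("get_weather", {"city": "Paris"})` pair, so existing tests written against the legacy order keep passing while real model output is also recognized.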
- Set HarmonyToolParser.SUPPORTS_NATIVE_TOOL_FORMAT = True so multi-turn
tool history uses native harmony tokens instead of plain text conversion
("[Calling tool: ...]"), which broke GPT-OSS tool flow understanding.
- Extend load_model_with_fallback to catch "Missing N parameters" errors
(not just "parameters not in model") for VLM-packaged models like
Qwen3.5-9B and Mistral-Small-3.2 that need strict=False.
- Update harmony and native format tests accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
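The widened error check in load_model_with_fallback might look roughly like the sketch below. Everything here is an assumption for illustration: `load_fn` is a hypothetical loader accepting a `strict` keyword, the exception type is assumed to be `ValueError`, and only the two message fragments are taken from the commit message.

```python
import re

# Matches the original message and the newly handled "Missing N parameters" form.
_STRICT_ERROR_RE = re.compile(r"parameters not in model|Missing \d+ parameters")


def needs_non_strict_retry(exc: Exception) -> bool:
    """True when the error message indicates a weight/parameter mismatch."""
    return bool(_STRICT_ERROR_RE.search(str(exc)))


def load_with_fallback_sketch(load_fn, path: str):
    """Try a strict load first; retry with strict=False only for mismatch errors."""
    try:
        return load_fn(path, strict=True)
    except ValueError as exc:
        if not needs_non_strict_retry(exc):
            raise  # unrelated failure: do not mask it
        return load_fn(path, strict=False)


def fake_vlm_loader(path: str, strict: bool = True):
    """Hypothetical loader that mimics a VLM-packaged checkpoint."""
    if strict:
        raise ValueError("Missing 12 parameters: vision_tower.weight, ...")
    return ("model", path)


loaded = load_with_fallback_sketch(fake_vlm_loader, "some/model/path")
```

Matching on message text is brittle, so re-raising anything that does not match keeps genuine load failures visible.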
- Add explicit parentheses in tokenizer.py fallback condition to clarify `or`/`and` precedence (behavior was correct but ambiguous to read).
- Fix _convert_param_value() in seed_oss and qwen3coder parsers: when schema says "number"/"float", always return float instead of silently coercing 3.0 → int(3). Removes lossy `fv - int(fv) != 0` check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
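The float coercion change can be sketched as follows. Both functions are simplified, hypothetical reductions of `_convert_param_value()` that handle only numeric schema types; the "before" version is reconstructed from the `fv - int(fv) != 0` check named in the commit message.

```python
def convert_number_before(value: str, schema_type: str):
    """Old behavior (reconstructed): whole-valued floats were turned into ints."""
    if schema_type in ("number", "float"):
        fv = float(value)
        return fv if fv - int(fv) != 0 else int(fv)
    if schema_type in ("integer", "int"):
        return int(value)
    return value


def convert_number_after(value: str, schema_type: str):
    """New behavior: a "number"/"float" schema always yields a float."""
    if schema_type in ("number", "float"):
        return float(value)
    if schema_type in ("integer", "int"):
        return int(value)
    return value


old = convert_number_before("3.0", "number")
new = convert_number_after("3.0", "number")
```

In Python `3 == 3.0`, so the difference only shows up in the type, which matters once arguments are serialized to JSON (`3` versus `3.0`) or checked against a schema downstream.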
Summary
- `seed_oss`, `seed`, `gpt_oss`: GPT-OSS-20B XML format with `<seed:tool_call>` + `<seed:think>` thinking blocks
- `deepseek_v31`, `deepseek_r1_0528`: DeepSeek V3.1/R1-0528 unicode special tokens (simpler than V3: no code fence, no type prefix)
- `qwen3_coder_xml`, `qwen3_xml`: Qwen3-Coder XML format with `<tool_call>`/`<function=...>` and parameter type conversion
- `evals/README.md` server flags table, `evals/run_all_models.sh` GPT-OSS parser `minimax` → `seed_oss`

Motivation
MLX download rankings show these are among the most popular models:
- `minimax` parser → only 17% tool calling score
- `hermes` works (90%) but upstream has a dedicated XML parser

Test plan
- `python3.12 -m pytest tests/test_upstream_regression.py -v`: 72/72 pass
- `python3.12 -m pytest tests/test_tool_parsers.py tests/test_minimax_tool_parser.py -v`: 143/143 pass (no regressions)
- `--tool-call-parser seed_oss` (if server available)

🤖 Generated with Claude Code